feat: add DM (Data Migration) support #6883

Draft
tennix wants to merge 6 commits into pingcap:main from tennix:feature/dm-support

tennix (Member) commented May 9, 2026

Summary

  • Add CRD API types for DMGroup, DM, DMWorkerGroup, DMWorker with TLS support from the start
  • Add runtime/scope types and config packages (pkg/configs/dm, pkg/configs/dmworker) for dm-master and dm-worker
  • Add four controllers following the Group+Instance pattern:
    • DMGroup: manages dm-master replicas with embedded etcd bootstrap, headless + internal service, rolling update
    • DM: dm-master instance controller with health check (GET /api/v1/cluster/info), ConfigMap, DataVolume PVC, pod lifecycle
    • DMWorkerGroup: manages dm-worker replicas with headless service, maxSurge=0/maxUnavailable=1 for pod name stability
    • DMWorker: dm-worker instance controller with health check (GET /status), ConfigMap, RelayVolume PVC, pod lifecycle
  • Register all controllers in cmd/tidb-operator/main.go
  • Add example manifests examples/basic/07-dm.yaml and examples/basic/08-dmworker.yaml
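
The two health-check endpoints listed above can be illustrated with a minimal Go sketch. It is not the controller code from this PR: the `probe` and `healthPath` helpers are hypothetical, treating any 2xx response as healthy is an assumption, and the demo in `main` talks to a local stand-in server rather than a real dm-master.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// Health-check paths as stated in the PR summary.
const (
	dmMasterHealthPath = "/api/v1/cluster/info"
	dmWorkerHealthPath = "/status"
)

// healthPath maps a component name to its health-check path.
// The component names are illustrative labels, not operator constants.
func healthPath(component string) (string, bool) {
	switch component {
	case "dm-master":
		return dmMasterHealthPath, true
	case "dm-worker":
		return dmWorkerHealthPath, true
	}
	return "", false
}

// probe sends GET baseURL+path and reports whether the status is 2xx.
// Assumption: a 2xx status is enough to call the instance healthy.
func probe(ctx context.Context, c *http.Client, baseURL, path string) (bool, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, baseURL+path, nil)
	if err != nil {
		return false, err
	}
	resp, err := c.Do(req)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	return resp.StatusCode >= 200 && resp.StatusCode < 300, nil
}

func main() {
	// Stand-in for a dm-master: answers 200 on its health path only.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path == dmMasterHealthPath {
			w.WriteHeader(http.StatusOK)
			return
		}
		http.NotFound(w, r)
	}))
	defer srv.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	path, _ := healthPath("dm-master")
	healthy, err := probe(ctx, srv.Client(), srv.URL, path)
	fmt.Println("healthy:", healthy, "err:", err)
}
```

With TLS enabled, the same probe would need an `http.Client` configured with the cluster certificates; that wiring is omitted here.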

Architecture notes

  • Bootstrap: dm-master uses AnnoKeyInitialClusterNum annotation on instances to compute initial-cluster; TaskBoot removes the annotation once all initial instances are ready. Bootstrap state is derived from existing instances (absence of annotation = bootstrapped).
  • dm-worker join: join address is derived from <DMGroupRef.Name>-dm-master internal service at the default DM port (8261).
  • Pod name stability: DMWorker uses maxSurge=0, maxUnavailable=1 to preserve pod names required by dm-master's lastBound etcd history.
  • No PD dependency: dm-master uses embedded etcd only; no PDClientManager needed.
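
The bootstrap and join rules above can be sketched in Go as follows. This is an illustration under stated assumptions, not the PR's implementation: the `instance` struct, the helper names, and the annotation key string are hypothetical (only the constant name `AnnoKeyInitialClusterNum` and port 8261 come from the description), and treating a group with no instances as not yet bootstrapped is an assumption.

```go
package main

import "fmt"

const (
	// Placeholder value; the PR only names the constant AnnoKeyInitialClusterNum,
	// so this string is NOT the operator's real annotation key.
	annoKeyInitialClusterNum = "example.invalid/initial-cluster-num"
	// Default DM port as stated in the architecture notes.
	defaultDMMasterPort = 8261
)

// instance is a hypothetical, trimmed-down view of a DM instance.
type instance struct {
	Name        string
	Annotations map[string]string
	Ready       bool
}

// bootstrapped reports whether no existing instance still carries the boot
// annotation. Assumption: an empty group counts as not bootstrapped.
func bootstrapped(instances []instance) bool {
	if len(instances) == 0 {
		return false
	}
	for _, in := range instances {
		if _, ok := in.Annotations[annoKeyInitialClusterNum]; ok {
			return false
		}
	}
	return true
}

// allInitialReady reports whether every instance that still carries the boot
// annotation is ready, i.e. the point at which the annotation could be removed.
func allInitialReady(instances []instance) bool {
	for _, in := range instances {
		if _, ok := in.Annotations[annoKeyInitialClusterNum]; ok && !in.Ready {
			return false
		}
	}
	return true
}

// joinAddr builds the dm-worker join address from the DMGroup name:
// <DMGroupRef.Name>-dm-master at the default DM port.
func joinAddr(dmGroupName string) string {
	return fmt.Sprintf("%s-dm-master:%d", dmGroupName, defaultDMMasterPort)
}

func main() {
	boot := map[string]string{annoKeyInitialClusterNum: "3"}
	group := []instance{
		{Name: "dm-0", Annotations: boot, Ready: true},
		{Name: "dm-1", Annotations: boot, Ready: false},
	}
	fmt.Println(bootstrapped(group), allInitialReady(group))
	fmt.Println(joinAddr("dm"))
}
```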

Test plan

  • Apply examples/basic/07-dm.yaml — verify DMGroup creates headless + internal service and DM pods
  • Verify dm-master bootstrap: AnnoKeyInitialClusterNum annotation removed after all initial replicas ready
  • Apply examples/basic/08-dmworker.yaml — verify DMWorkerGroup creates headless service and DMWorker pods
  • Verify dm-worker joins dm-master via <dmgroup>-dm-master service
  • Scale DMGroup replicas 1→3 — verify join mode used for new instances (no boot annotation)
  • Scale DMWorkerGroup replicas 1→3 — verify rolling update with maxUnavailable=1
  • Verify TLS: enable spec.tlsCluster on Cluster, confirm TLS mounts appear in pods
  • Delete DMGroup — verify all DM instances and services are cleaned up
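
When checking the rolling-update items above, it may help to state what maxSurge=0 and maxUnavailable=1 are expected to allow. The following Go sketch is a generic model of those two limits, not the operator's updater; both function names are hypothetical.

```go
package main

import "fmt"

// updateBudget returns how many more instances may be taken down right now,
// given the desired replica count, the number currently available, and
// the maxUnavailable limit.
func updateBudget(desired, available, maxUnavailable int) int {
	unavailable := desired - available
	if unavailable < 0 {
		unavailable = 0
	}
	budget := maxUnavailable - unavailable
	if budget < 0 {
		return 0
	}
	return budget
}

// maxInstances returns the upper bound on instances that may exist at once.
// With maxSurge=0 this equals desired, so no extra instance names are created.
func maxInstances(desired, maxSurge int) int {
	return desired + maxSurge
}

func main() {
	fmt.Println(updateBudget(3, 3, 1)) // all available: one instance may be restarted
	fmt.Println(updateBudget(3, 2, 1)) // one already down: wait before touching another
	fmt.Println(maxInstances(3, 0))    // never more than the desired count
}
```

Under this model, at most one dm-worker is unavailable at any moment and the set of pod names never grows beyond the desired replicas.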

🤖 Generated with Claude Code

tennix and others added 6 commits May 8, 2026 17:23
Introduces DMGroup/DM (dm-master) and DMWorkerGroup/DMWorker (dm-worker)
CRDs following the existing Group+Instance pattern used by TiCDC, TiDB,
and TiKV. Adds ComponentDMMaster/ComponentDMWorker constants to the meta
package and regenerates deepcopy/register files.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…worker

- Register DM and DMWorker in InstanceSet, DMGroup and DMWorkerGroup in GroupSet
- Add full Instance/Group interface implementations for DM master (DM, DMGroup)
- Add full Instance/Group interface implementations for DM worker (DMWorker, DMWorkerGroup)
- Add scope structs for DM and DMWorker with GVK, component, list helpers

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add VolumeMountType constants and default paths for DM master (data) and DM worker (relay-dir)
- Add container names, config dir paths, and cluster TLS dir paths to names.go
- Add DM/DMWorker port utility functions to coreutil
- Register DM container names in allMainContainers
- Add dm config package (master-addr, peer-urls, initial-cluster, join, TLS)
- Add dmworker config package (worker-addr, join, relay-dir, TLS)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add DMState, DMGroupState, DMSliceState, DMWorkerState interfaces to common
- Add DM image constant (pingcap/dm) to image package
- Implement dm-master instance controller with full task pipeline:
  - state: peer DM slice for initial-cluster computation
  - ctx: HTTP health check against /api/v1/cluster/info
  - cm: config map with initial-cluster or join address
  - pvc: DataVolume PVC + additional volumes
  - pod: dm-master container with ports 8261/8291, TLS support
  - status: sync MemberID and conditions

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ses 5-8)

- Add DMGroup/DMWorker controllers with Group+Instance pattern
- DMGroup controller: headless+internal services, bootstrap annotation
  management, rolling update with maxSurge=0/maxUnavailable=1
- DM instance controller: configmap, PVC (DataVolume), pod, health check
  via /api/v1/cluster/info, status sync
- DMWorkerGroup controller: headless service only, no bootstrap sequence
- DMWorker instance controller: configmap, PVC (RelayVolume), pod, health
  check via /status, dm-master addr from DMGroupRef internal service
- Register all four controllers in main.go with field indexes and cache

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add basic cluster examples for a 1-replica dm-master cluster (DMGroup)
and a 1-replica dm-worker (DMWorkerGroup) pointing to it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ti-chi-bot Bot (Contributor) commented May 9, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ti-chi-bot ti-chi-bot Bot requested a review from shonge May 9, 2026 07:17
ti-chi-bot Bot (Contributor) commented May 9, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign cofyc for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@github-actions github-actions Bot added the v2 for operator v2 label May 9, 2026
@ti-chi-bot ti-chi-bot Bot added the size/XXL label May 9, 2026
@codecov-commenter

Codecov Report

❌ Patch coverage is 0% with 1571 lines in your changes missing coverage. Please review.
✅ Project coverage is 35.16%. Comparing base (2b81667) to head (c45afd3).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6883      +/-   ##
==========================================
- Coverage   37.44%   35.16%   -2.28%     
==========================================
  Files         392      427      +35     
  Lines       22432    24054    +1622     
==========================================
+ Hits         8399     8458      +59     
- Misses      14033    15596    +1563     
Flag Coverage Δ
unittest 35.16% <0.00%> (-2.28%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

